Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We study stochastic approximation procedures for approximately solving a $$d$$-dimensional linear fixed point equation based on observing a trajectory of length $$n$$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $$t_{\mathrm{mix}} \tfrac{d}{n}$$ on the squared error of the last iterate of a standard scheme, where $$t_{\mathrm{mix}}$$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $$(d, t_{\mathrm{mix}})$$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise—covering the TD($$\lambda$$) family of algorithms for all $$\lambda \in [0, 1)$$—and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $$\lambda$$ when running the TD($$\lambda$$) algorithm).more » « less
An official website of the United States government

Full Text Available